Tree Induction vs. Logistic Regression: A Learning-Curve Analysis

Authors

  • Claudia Perlich
  • Foster J. Provost
  • Jeffrey S. Simonoff
Abstract

Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. Contrary to prior observations, logistic regression does not generally outperform tree induction. More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of signal-to-noise ratio.
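The claim that learning curves cross can be made concrete with a small sketch. The function below takes the accuracy of two methods measured at increasing training-set sizes and reports the size intervals in which the better method changes. The function name `find_crossovers` and all accuracy values are hypothetical and made up for illustration; they are not results from the paper.

```python
from typing import List, Sequence, Tuple


def find_crossovers(
    sizes: Sequence[int],
    acc_lr: Sequence[float],
    acc_tree: Sequence[float],
) -> List[Tuple[int, int]]:
    """Return (size_before, size_after) pairs between which the better method flips.

    Points where the two accuracies are exactly tied are skipped, so a
    crossing is reported between the nearest sizes with a clear winner.
    """
    if not (len(sizes) == len(acc_lr) == len(acc_tree)):
        raise ValueError("all inputs must have the same length")

    crossings: List[Tuple[int, int]] = []
    prev_sign = 0      # +1: logistic regression ahead, -1: tree ahead
    prev_size = None   # last training-set size with a clear winner
    for n, a, b in zip(sizes, acc_lr, acc_tree):
        diff = a - b
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            crossings.append((prev_size, n))
        prev_sign, prev_size = sign, n
    return crossings


# Hypothetical accuracies (assumed values, NOT taken from the paper):
# logistic regression leads on small samples, the tree on large ones.
sizes = [100, 1000, 10000, 100000]
acc_lr = [0.70, 0.74, 0.75, 0.75]
acc_tree = [0.62, 0.71, 0.78, 0.82]

print(find_crossovers(sizes, acc_lr, acc_tree))  # [(1000, 10000)]
```

With such a crossing, a comparison run only at 1,000 examples would favor logistic regression, while one run only at 10,000 would favor the tree, which is why the abstract argues that conclusions should rest on the whole curve.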

Related resources

Comparison of Gestational Diabetes Prediction Between Logistic Regression, Discriminant Analysis, Decision Tree and Artificial Neural Network Models

Background and Objectives: Gestational Diabetes Mellitus (GDM) is the most common metabolic disorder in pregnancy. In case of early detection, some of its complications can be prevented. The aim of this study was to investigate early prediction of GDM by logistic regression (LR), discriminant analysis (DA), decision tree (DT) and perceptron artificial neural network (ANN) and to compare these m...

Prediction of Fault-Prone Software Modules using Statistical and Machine Learning Methods

Demand for producing quality software has rapidly increased during the last few years. This is leading to an increase in the development of machine learning methods for exploring data sets, which can be used in constructing models for predicting quality attributes such as fault proneness, maintenance effort, testing effort, productivity and reliability. This paper examines and compares logistic regres...

Comparing the Results of Logistic Regression Model and Classification and Regression Tree Analysis in Determining Prognostic Factors for Coronary Artery Disease in Mashhad, Iran

Background and purpose: Understanding of the risk factors for cardiovascular artery disease, which is the leading cause of death worldwide, can lead to essential changes in its etiology, prevalence, and treatment. The aim of this study was to compare the results of logistic regression model and Classification and Regression Tree Analysis (CART) in determining the prognostic factors for coronary...

Models to predict cardiovascular risk: comparison of CART, multilayer perceptron and logistic regression

An estimate of multivariate risk is now required in guidelines for cardiovascular prevention. Limitations of existing statistical risk models have motivated the exploration of machine-learning methods. This study evaluates the implementation and performance of a decision tree (CART) and a multilayer perceptron (MLP) to predict cardiovascular risk from real data. The study population was randomly split into a...


Journal:
  • Journal of Machine Learning Research

Volume 4  Issue 

Pages  -

Publication date: 2003